Abstract. We study notions of robustness of Markov kernels and probability distributions
of a system that is described by $n$ input random variables and one output random
variable. Markov kernels can be expanded in a series of potentials that allow one to
describe the system's behaviour after knockouts. Robustness imposes structural
constraints on these potentials.

Robustness of probability distributions is defined via conditional independence
statements. These statements can be studied algebraically. The corresponding
conditional independence ideals are related to binomial edge ideals. The set of robust
probability distributions lies on an algebraic variety. We compute a Gröbner basis
of this ideal and study the irreducible decomposition of the variety. These algebraic
results allow us to parametrize the set of all robust probability distributions.

1. Introduction

In this article we study a notion of robustness with tools from algebraic geometry.
This work has been initiated in [1]. Connections to algebraic geometry have already
been addressed in [6]. We consider $n$ input nodes, denoted by $1, 2, \dots, n$, and one
output node, denoted by $0$. For each $i = 0, 1, \dots, n$ the state of node $i$ is a discrete
random variable $X_i$ taking values in the finite set $\mathcal{X}_i$ of cardinality $d_i$. The joint state
space is the set $\mathcal{X} = \mathcal{X}_0 \times \mathcal{X}_1 \times \dots \times \mathcal{X}_n$. For any subset $S \subseteq \{0, \dots, n\}$ write $X_S$ for the
random vector $(X_i)_{i \in S}$; then $X_S$ is a random variable with values in $\mathcal{X}_S = \times_{i \in S} \mathcal{X}_i$. For
any $x \in \mathcal{X}$, the restriction of $x$ to a subset $S \subseteq \{0, \dots, n\}$ is the vector $x|_S \in \mathcal{X}_S$ with
$(x|_S)_i = x_i$ for all $i \in S$.

We study two possible models for the computation of the output from the input:
The first model is a stochastic map (Markov kernel) $\kappa$ from $\mathcal{X}_{[n]}$ to $\mathcal{X}_0$, that is, $\kappa$ is a
function

$$\kappa : \mathcal{X}_{[n]} \times \mathcal{X}_0 \to [0,1], \qquad (x, y) \mapsto \kappa(x; y),$$

satisfying $\sum_{y \in \mathcal{X}_0} \kappa(x; y) = 1$ for all $x$. The second model is a joint probability
distribution $p$ of the random vector $(X_0, X_{[n]})$. These two models are related as follows: The
joint probability distribution $p$ of $(X_0, X_{[n]})$ can be factorized as

$$p(y, x) = p(y|x)\, p_{in}(x), \qquad \text{for all } (y, x) \in \mathcal{X},$$

where $p_{in}$ is the distribution of the input nodes and $p(y|x)$ is a conditional distribution,
which need not be unique. Each possible choice of this conditional distribution defines
a Markov kernel $\kappa(x; y) := p(y|x)$. Conversely, when a Markov kernel $\kappa$ is given,
then any input distribution $p_{in}(x)$ defines a joint distribution $p(x, y) = p_{in}(x)\kappa(x; y)$.
The result of our analysis will not depend too much on the precise form of the input
distribution; it will turn out that only the support $\mathrm{supp}(p_{in}) := \{x \in \mathcal{X}_{[n]} : p_{in}(x) > 0\}$
is important. Similarly, in the analysis of the kernels, there will also be a set $\mathcal{S}$ of
"relevant inputs" that will play an important role.

We study robustness with respect to knockouts of some of the input nodes $[n]$ in
both models. When a subset $S$ of the input nodes is knocked out, and only the nodes
in $R = [n] \setminus S$ remain, then the behaviour of the system changes. Without further
assumptions, the post-knockout function is not determined by $\kappa$ and has to be specified.
We therefore consider a further stochastic map $\kappa_R : \mathcal{X}_R \times \mathcal{X}_0 \to [0, 1]$ as model of
the post-knockout function. A complete specification of the function is given by the
family $(\kappa_A)_{A \subseteq [n]}$ of all possible post-knockout functions, which we refer to as functional
modalities. As a shorthand notation we denote functional modalities as $(\kappa_A)$. The
Markov kernel $\kappa$ itself, which describes the normal behaviour of the system without
knockouts, can be identified with $\kappa_{[n]}$.

What does it mean for a stochastic map to be robust? Assume that the input is in
state $x$, and that we knock out a set $S$ of inputs. Denoting the remaining set of inputs
by $R$, we say that $(\kappa_A)$ is robust in $x = (x_R, x_S)$ against knockout of $S$, if

$$\kappa(x_R, x_S; x_0) = \kappa_R(x_R; x_0) \qquad \text{for all } x_0 \in \mathcal{X}_0. \tag{1}$$

If $\mathcal{R}$ is a collection of subsets of $[n]$ and if $(\kappa_A)$ is robust in $x$ against knockout of
$[n] \setminus R$ for all $R \in \mathcal{R}$, then we say that $(\kappa_A)$ is $\mathcal{R}$-robust in $x$. In Section 2 we consider
Gibbs representations of functional modalities and derive structural constraints on the
corresponding interaction potentials that are imposed by robustness properties. These
constraints do not depend on the configuration $x$ in which the functional modalities are
assumed to be robust.

As in the case of Markov kernels, the joint probability distribution $p$ does not
allow us to predict the behaviour of a perturbed system. Nevertheless, we can ask whether
it is at all possible that the behaviour of the system is robust against a given knockout
of $S$. Let $p_{in}$ be an input distribution, and let $(\kappa_A)$ be the functional modalities of the
system. If $(\kappa_A)$ is robust against knockout of $S$ in $x$ for all $x \in \mathrm{supp}(p_{in})$, then $X_0$
is stochastically independent of $X_S$ given $X_R$ (with respect to the joint probability
distribution $p(x_0, x_{in}) = p_{in}(x_{in})\,\kappa(x_{in}; x_0)$), where $R = [n] \setminus S$, a fact that will be denoted
by $X_0 \perp\!\!\!\perp X_S \mid X_R$. In order to see this, assume $x = (x_R, x_S) \in \mathrm{supp}(p_{in})$. Then

$$
\begin{aligned}
p(x_0 | x_R, x_S) &= \kappa(x_R, x_S; x_0) \\
&= \kappa_R(x_R; x_0) \sum_{x'_S : (x_R, x'_S) \in \mathrm{supp}(p_{in})} p(x'_S | x_R) \\
&= \sum_{x'_S : (x_R, x'_S) \in \mathrm{supp}(p_{in})} p(x'_S | x_R)\, \kappa(x_R, x'_S; x_0) \\
&= \sum_{x'_S : (x_R, x'_S) \in \mathrm{supp}(p_{in})} p(x'_S | x_R)\, p(x_0 | x_R, x'_S) \\
&= p(x_0 | x_R).
\end{aligned}
$$

On the other hand, if $X_0 \perp\!\!\!\perp X_S \mid X_R$ holds for a joint distribution $p$, then any family
$(\kappa_A)$ with the property that $\kappa_A(x_A; x_0) = p(x_0 | x_A)$ whenever $p(x_A) > 0$ is robust against
knockout of $S$ for all $x \in \mathrm{supp}(p_{in})$, where $p_{in}$ is the marginal input distribution.

Therefore, we call the joint probability distribution $p$ robust against knockout of
$S$, if $X_0 \perp\!\!\!\perp X_S \mid X_R$. This means that we do not lose information about the output $X_0$,
if the subset $S$ of the inputs is unknown or hidden (or "knocked out"). Probability



ROBUSTNESS AND CONDITIONAL INDEPENDENCE IDEALS 



distributions that are robust in this sense are studied in Section 3. Section 4 discusses
the case that $X_0$ is a deterministic function of the input nodes. The symmetric case that
$p$ is robust against knockout of any set $S$ of cardinality at most $n - k$ is studied in
Section 5.

The results about robustness are derived from an algebraic theory of generalized
binomial edge ideals, which generalize the binomial edge ideals of [6] and [9]. This
theory is presented in Section 6. A Gröbner basis is constructed, and it is shown that
these ideals are radical. Finally, a primary decomposition is computed. Similar CI
statements have recently been studied in [11]. That work discusses what is called
$(n - 1)$-robustness in Section 5.

2. Robustness of Markov kernels

Let $(\kappa_A)_{A \subseteq [n]}$ be a collection of functional modalities, as defined in the
introduction. Instead of providing a list of all functional modalities $\kappa_A$, one can describe
them in more mechanistic terms. In order to illustrate this, we first consider an example which
comes from the field of neural networks. In that example, we assume that the output
node receives an input $x = (x_1, \dots, x_n) \in \{-1, +1\}^n$ and generates the output $+1$ with
probability

$$\kappa(x_1, \dots, x_n; +1) := \frac{1}{1 + e^{-2 \sum_{i=1}^{n} w_i x_i}}, \tag{2}$$

which implies that for an arbitrary output $x_0$

$$\kappa(x_1, \dots, x_n; x_0) = \frac{e^{x_0 \sum_{i=1}^{n} w_i x_i}}{\sum_{x'_0} e^{x'_0 \sum_{i=1}^{n} w_i x_i}}.$$

This representation of the stochastic map $\kappa$ has a structure that allows inferring the
function after a knockout of a set $S$ of input nodes, by simply removing the contribution
of all the nodes in $S$. In our example (2), the post-knockout function is then given
as

$$\kappa_R(x_R; +1) := \frac{1}{1 + e^{-2 \sum_{i \in R} w_i x_i}},$$

where $R = [n] \setminus S$. This inference of the post-knockout function is based on the
decomposition of the sum that appears in (2). Such a decomposition is referred to as
a Gibbs representation of $\kappa$ and contains more information than $\kappa$. More generally, we
consider the following model of $(\kappa_A)$:

$$\kappa_A(x_A; x_0) = \frac{e^{\sum_{B \subseteq A} \phi_B(x_B, x_0)}}{\sum_{x'_0} e^{\sum_{B \subseteq A} \phi_B(x_B, x'_0)}}, \tag{3}$$

where the $\phi_B$ are functions on $\mathcal{X}_B \times \mathcal{X}_0$. Clearly, each $\kappa_A$ is strictly positive. Using
the Möbius inversion, it is easy to see that each strictly positive family $(\kappa_A)$ has the
representation (3). To this end, we simply set

$$\phi_A(x_A, x_0) := \sum_{C \subseteq A} (-1)^{|A \setminus C|} \ln \kappa_C(x_C; x_0). \tag{4}$$

Note that this representation is not unique: If an arbitrary function of $x_A$ is added to
the function $\phi_A$, then (3) does not change.
A single robustness constraint has the following consequences for the $\phi_A$.

Proposition 1. Let $S \subseteq [n]$ and $R = [n] \setminus S$, and let $(\kappa_A)$ be strictly positive functional
modalities with Gibbs potentials $(\phi_A)$. Then $(\kappa_A)$ is robust in $x$ against knockout of $S$
if and only if $\sum_{B \subseteq [n],\, B \not\subseteq R} \phi_B(x|_B, x_0)$ does not depend on $x_0$.

Proof. Denote by $\varphi_A$ the potentials defined via (4). Then (1) is equivalent to

$$\sum_{B \subseteq [n]} \varphi_B(x|_B, x_0) = \sum_{B \subseteq R} \varphi_B(x|_B, x_0) \iff \sum_{B \subseteq [n],\, B \not\subseteq R} \varphi_B(x|_B, x_0) = 0.$$

The statement follows from the fact that $\phi_B(x|_B; x_0) - \varphi_B(x|_B; x_0)$ is independent of $x_0$
(for fixed $x$). □

Does $\mathcal{R}$-robustness in $x$ imply any structural constraints on $(\kappa_A)$? In order to answer
this question, we restrict attention to the case $\mathcal{R} = \mathcal{R}_k := \{R \subseteq [n] : |R| \ge k\}$.

If $(\kappa_A)$ is $\mathcal{R}_k$-robust on a set $\mathcal{S}$, then the corresponding conditions imposed by
Proposition 1 depend on $\mathcal{S}$. In this section, we are interested in conditions that are
independent of $\mathcal{S}$. Such conditions allow us to define sets of functional modalities that contain all
$\mathcal{R}_k$-robust functional modalities for all possible sets $\mathcal{S}$. If the set $\mathcal{S}$ (which will be the
support of the input distribution in Section 3) is unknown from the beginning, then the
system can choose its policy within such a restricted set of functional modalities.

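Denote by $\mathcal{K}_k$ the set of all functional modalities $(\kappa_A)$ such that there exist potentials $\phi_A$
of the form

$$\phi_A(x_A; x_0) = \sum_{B \subseteq A,\ |B| \le k} \Psi_{B,A}(x_B; x_0),$$

where $\Psi_{B,A}$ is an arbitrary function in $\mathbb{R}^{\mathcal{X}_B \times \mathcal{X}_0}$. The set $\mathcal{K}_k$ is called the family
of $k$-interaction functional modalities. It contains the subset $\tilde{\mathcal{K}}_k$ of those functional
modalities $(\kappa_A)$ where the functions $\Psi_{B,A}$ additionally satisfy

$$(-1)^{|A|}\, \Psi_{B,A}(x_B; x_0) = (-1)^{|A'|}\, \Psi_{B,A'}(x_B; x_0), \qquad \text{whenever } B \subseteq A \cap A' \text{ and } |B| < k,$$

and

$$\Biggl(\sum_{i=0}^{|A'|-k} \frac{(-1)^{|A'|-i} \binom{|A'|-k}{i}}{\binom{k+i}{k}}\Biggr) \Psi_{B,A}(x_B; x_0) = \Biggl(\sum_{i=0}^{|A|-k} \frac{(-1)^{|A|-i} \binom{|A|-k}{i}}{\binom{k+i}{k}}\Biggr) \Psi_{B,A'}(x_B; x_0), \qquad \text{if } B \subseteq A \cap A' \text{ and } |B| = k,$$

for all $x_B \in \mathcal{X}_B$ and $x_0 \in \mathcal{X}_0$. Both $\mathcal{K}_k$ and $\tilde{\mathcal{K}}_k$ only contain strictly positive kernels.
Therefore, we are also interested in the respective closures of these two families with
respect to the usual real topology on the space of matrices.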
The following holds:

Proposition 2. Let $\mathcal{S}$ be a subset of $\mathcal{X}_{[n]}$ and let $(\kappa_A)$ be functional modalities that are
$\mathcal{R}_k$-robust in $x$ for all $x \in \mathcal{S}$. Then there exist functional modalities $(\tilde\kappa_A)$ in the closure
of $\tilde{\mathcal{K}}_k$ such that $\kappa_A(x|_A) = \tilde\kappa_A(x|_A)$ for all $A$ and all $x \in \mathcal{S}$. In particular, $(\tilde\kappa_A)$ belongs to
the closure of the family of $k$-interactions.




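Proof. Assume first that $(\kappa_A)$ is strictly positive. Define Gibbs potentials using the
Möbius inversion (4). Note that

$$
\begin{aligned}
\sum_{C \subseteq A,\ |C| \ge k} (-1)^{|A \setminus C|} \ln \kappa_C(x_C; x_0)
&= \sum_{C \subseteq A,\ |C| \ge k} (-1)^{|A \setminus C|} \frac{1}{\binom{|C|}{k}} \sum_{B \subseteq C,\ |B| = k} \ln \kappa_B(x_B; x_0) \\
&= \sum_{B \subseteq A,\ |B| = k} \Biggl( \sum_{R \subseteq A \setminus B} \frac{(-1)^{|A \setminus (B \cup R)|}}{\binom{|R| + k}{k}} \Biggr) \ln \kappa_B(x_B; x_0).
\end{aligned}
$$

Together with (4) this gives

$$\phi_A(x_A, x_0) = \sum_{C \subseteq A,\ |C| \le k} \alpha_{A,C} \ln \kappa_C(x_C; x_0),$$

where

$$\alpha_{A,C} = \begin{cases} (-1)^{|A \setminus C|}, & \text{if } |C| < k, \\ \sum_{R \subseteq A \setminus C} (-1)^{|A \setminus (C \cup R)|} \binom{|R| + k}{k}^{-1}, & \text{if } |C| = k \end{cases}$$

depends only on the cardinalities of $A$ and $C$. The statement follows with the choice
$\Psi_{C,A}(x_C; x_0) = \alpha_{A,C} \ln \kappa_C(x_C; x_0)$.

If $(\kappa_A)$ is not strictly positive, then define $\lambda_A(x_A; x_0) = \frac{1}{d_0}$ for all $A \subseteq [n]$. Then the
functional modalities $(\lambda_A)$ are $\mathcal{R}_k$-robust for all $x \in \mathcal{S}$, and so are the strictly positive
functional modalities $(\kappa^\epsilon_A)$ defined via $\kappa^\epsilon_A = (1 - \epsilon)\kappa_A + \epsilon\lambda_A$. The statement follows
from $\lim_{\epsilon \to 0} \kappa^\epsilon_A = \kappa_A$. □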

Example 3. Consider the case of $n = 2$ binary inputs, $\mathcal{X}_1 = \mathcal{X}_2 = \{0, 1\}$, and let
$\mathcal{S} = \{(0, 0), (1, 1)\}$. Then $\mathcal{R}_1$-robustness on $\mathcal{S}$ means

$$\kappa_{\{1\}}(x_1; x_0) = \kappa_{\{1,2\}}(x_1, x_2; x_0) = \kappa_{\{2\}}(x_2; x_0)$$

for all $x_0$ whenever $x_1 = x_2$. By Proposition 1 this translates into the conditions

$$\phi_{\{1,2\}}(x_1, x_2; x_0) + \phi_{\{1\}}(x_1; x_0) = 0 = \phi_{\{1,2\}}(x_1, x_2; x_0) + \phi_{\{2\}}(x_2; x_0) \tag{5}$$

for all $x_0$ whenever $x_1 = x_2$ for the potentials $(\phi_A)$ defined via (4). This means: Assuming
that $(\kappa_A)$ is $\mathcal{R}_1$-robust, it suffices to specify the four functions

$$\phi_\emptyset(x_0), \quad \phi_{\{1\}}(x_1; x_0), \quad \phi_{\{1,2\}}(0, 1; x_0), \quad \phi_{\{1,2\}}(1, 0; x_0).$$

The remaining potentials can be deduced from (5). If only the values of $(\kappa_A)$ for $x \in \mathcal{S}$
are needed, then it suffices to specify $\phi_\emptyset(x_0)$ and $\phi_{\{1\}}(x_1; x_0)$.

Even though the families $\mathcal{K}_k$ and $\tilde{\mathcal{K}}_k$ do not depend on the set $\mathcal{S}$, the choice of the
set $\mathcal{S}$ is essential: If the set $\mathcal{S}$ is too large, then the conditions (1) imply that the output
$X_0$ is (unconditionally) independent of all inputs. The theory developed in Sections 3
to 5 discusses the constraints on conditionals imposed by the choice of $\mathcal{S}$. In particular,
Section 4 gives bounds on the strength of the interaction between the input nodes and
the output node for given $\mathcal{R}$ and $\mathcal{S}$.

On the other hand, since $\mathcal{K}_k$ and $\tilde{\mathcal{K}}_k$ are independent of $\mathcal{S}$, Proposition 2 shows that
these two families can be used to construct robust systems, when the input distribution
$p_{in}$ is not known a priori (or may change over time) but must be learned by the system.

3. Robustness and conditional independence

We now study robustness of the joint distribution $p$ of $(X_0, X_{[n]})$. As stated in the
introduction, $p$ is called robust against knockout of $S$ if it satisfies $X_0 \perp\!\!\!\perp X_S \mid X_R$, where
$R = [n] \setminus S$. By definition this means that

$$p(x_0, x_S, x_R)\, p(x'_0, x'_S, x_R) = p(x_0, x'_S, x_R)\, p(x'_0, x_S, x_R), \tag{6}$$

for all $x_0, x'_0 \in \mathcal{X}_0$, $x_S, x'_S \in \mathcal{X}_S$ and $x_R \in \mathcal{X}_R$. Here, $p(x_0, x_S, x_R)$ is an abbreviation of
$P(X_0 = x_0, X_S = x_S, X_R = x_R)$. It is not difficult to see that this definition is equivalent
to the usual definition of conditional independence [?]. This algebraic formulation
makes it possible to study conditional independence with algebraic tools.
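Equations (6) are easy to test on a concrete joint distribution. The sketch below assumes a distribution stored as a dictionary keyed by `(x0, x)` with `x` a tuple of inputs indexed from 0; the function name `satisfies_ci` and the kernel values are hypothetical choices for illustration.

```python
import itertools

def satisfies_ci(p, R, tol=1e-12):
    """Check (6) for S = [n] \\ R; p maps (x0, x) to a probability, x a tuple of inputs."""
    outputs = sorted({x0 for x0, _ in p})
    inputs = sorted({x for _, x in p})
    for x, y in itertools.combinations(inputs, 2):
        if any(x[i] != y[i] for i in R):
            continue  # (6) only compares input states that agree on R
        for a, b in itertools.combinations(outputs, 2):
            minor = (p.get((a, x), 0.0) * p.get((b, y), 0.0)
                     - p.get((a, y), 0.0) * p.get((b, x), 0.0))
            if abs(minor) > tol:
                return False
    return True

# Hypothetical kernel that only reads the first input, combined with a uniform input law.
kernel = {0: (0.9, 0.1), 1: (0.2, 0.8)}
p = {(x0, x): 0.25 * kernel[x[0]][x0]
     for x in itertools.product((0, 1), repeat=2) for x0 in (0, 1)}

print(satisfies_ci(p, R=(0,)))  # knockout of the second input
print(satisfies_ci(p, R=(1,)))  # knockout of the first input
```

In this example the first call succeeds and the second fails, because the output depends on the first input only.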

In order to formulate the results in higher generality, we will also consider CI
statements of the form $X_0 \perp\!\!\!\perp X_S \mid X_R = y$ for some $S \subseteq [n]$, $R = [n] \setminus S$ and $y \in \mathcal{X}_R$. By
definition, this is equivalent to equations (6) for all $x_0, x'_0 \in \mathcal{X}_0$, $x_S, x'_S \in \mathcal{X}_S$ and $x_R = y$.
Such a statement models the case that, if the value of the input variables $X_R$ is $y$, then
the system does not need to know the remaining variables $X_S$ in order to compute
its output. Such CI statements naturally generalize canalizing [8] or nested canalizing
functions [7], which have been studied in the context of robustness. The simpler
statement $X_0 \perp\!\!\!\perp X_S \mid X_R$ corresponds to the special case where $X_0 \perp\!\!\!\perp X_S \mid X_R = y$ for
all $y \in \mathcal{X}_R$.

Let $\mathcal{R}$ be a collection of pairs $(R, y)$, where $R \subseteq [n]$ and $y \in \mathcal{X}_R$. Such a collection
will be called a robustness specification in the following. A joint distribution is called
$\mathcal{R}$-robust if it satisfies all conditional independence (CI) statements

$$X_0 \perp\!\!\!\perp X_{[n] \setminus R} \mid X_R = y \tag{7}$$

for all $(R, y) \in \mathcal{R}$. We denote by $\mathcal{P}_{\mathcal{R}}$ the set of all $\mathcal{R}$-robust probability distributions.

Example 4. As before, let $\mathcal{R}_k$ be the set of subsets of $[n]$ of cardinality $k$ or greater,
where each subset $R$ is paired with every $y \in \mathcal{X}_R$. In other words, a probability measure $p$
is $\mathcal{R}_k$-robust if we can knock out any $n - k$ input variables without losing information
on the output.

Equations (6) are polynomial equations in the elementary probabilities. They are
related to the binomial edge ideals introduced in [6]. The generalized binomial edge
ideals will be studied in Section 6. Here, we interpret the algebraic results from the
point of view of robustness.

Let $\mathcal{X}_{[n]} = \mathcal{X}_1 \times \dots \times \mathcal{X}_n$. A robustness specification $\mathcal{R}$ induces a graph $G_{\mathcal{R}}$ on $\mathcal{X}_{[n]}$,
where $x, x' \in \mathcal{X}_{[n]}$ are connected by an edge if and only if there exists $(R, y) \in \mathcal{R}$ such
that the restrictions of $x$ and $x'$ to $R$ satisfy $x|_R = x'|_R = y$.

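Definition 5. Let $\mathcal{Y} \subseteq \mathcal{X}_{[n]}$, and denote by $G_{\mathcal{R},\mathcal{Y}}$ the subgraph of $G_{\mathcal{R}}$ induced by $\mathcal{Y}$. The
set $\mathcal{Y}$ is called $\mathcal{R}$-connected if $G_{\mathcal{R},\mathcal{Y}}$ is connected. The set of connected components
of $G_{\mathcal{R},\mathcal{Y}}$ is called an $\mathcal{R}$-robustness structure. An $\mathcal{R}$-robustness structure $\mathcal{B}$ is maximal if
and only if $\cup\mathcal{B} := \cup_{\mathcal{Z} \in \mathcal{B}} \mathcal{Z}$ satisfies any of the following equivalent conditions:

(1) For any $x \in \mathcal{X}_{[n]} \setminus \cup\mathcal{B}$ there are edges $(x, y)$, $(x, z)$ in $G_{\mathcal{R}}$ such that $y, z \in \cup\mathcal{B}$ are
not connected in $G_{\mathcal{R},\cup\mathcal{B}}$.
(2) For any $x \in \mathcal{X}_{[n]} \setminus \cup\mathcal{B}$ the induced subgraph $G_{\mathcal{R},\cup\mathcal{B} \cup \{x\}}$ has fewer connected
components than $G_{\mathcal{R},\cup\mathcal{B}}$.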

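For any probability distribution $p$ on $\mathcal{X}$, $x_0 \in \mathcal{X}_0$ and $x \in \mathcal{X}_{[n]}$ denote by $p_x$ the vector
with components $p_x(x_0) = p(X_0 = x_0, X_{[n]} = x)$. Denote $\mathrm{supp}\, p := \{x \in \mathcal{X}_{[n]} : p_x \ne 0\}$.
For any family $\mathcal{B}$ of subsets of $\mathcal{X}_{[n]}$ let $\mathcal{P}_{\mathcal{B}}$ be the set of probability distributions $p$ that
satisfy the following two conditions:

(1) $\mathrm{supp}\, p = \cup\mathcal{B}$,
(2) $p_x$ and $p_y$ are proportional, whenever there exists $\mathcal{Z} \in \mathcal{B}$ such that $x, y \in \mathcal{Z}$.

It follows from (10) and Theorem 23 that $\mathcal{P}_{\mathcal{R}}$ equals the disjoint union $\cup_{\mathcal{B}} \mathcal{P}_{\mathcal{B}}$, where
the union is over all $\mathcal{R}$-robustness structures. Alternatively, $\mathcal{P}_{\mathcal{R}}$ equals the union $\cup_{\mathcal{B}} \overline{\mathcal{P}_{\mathcal{B}}}$,
where the union is over all maximal $\mathcal{R}$-robustness structures.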

For any $x \in \mathcal{X}_{[n]}$ the vector $p_x$ is proportional to the conditional probability distribution
$P(\cdot \mid X_{[n]} = x)$ of $X_0$ given that $X_{[n]} = x$. Hence:

Lemma 6. Let $p$ be a probability distribution, and let $\mathcal{B}$ be the set of connected
components of $G_{\mathcal{R},\mathrm{supp}\, p}$. Then $p$ is $\mathcal{R}$-robust if and only if $P(\cdot \mid X_{[n]} = x) = P(\cdot \mid X_{[n]} = y)$
whenever there exists $\mathcal{Z} \in \mathcal{B}$ such that $x, y \in \mathcal{Z}$.

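The following lemma sheds light on the structure of $\mathcal{P}_{\mathcal{B}}$:

Lemma 7. Fix an $\mathcal{R}$-robustness structure $\mathcal{B}$. Then $\mathcal{P}_{\mathcal{B}}$ consists of all probability
measures of the form

$$p(x_0, x_1, \dots, x_n) = \begin{cases} \mu(\mathcal{Z})\, \lambda_{\mathcal{Z}}(x_1, \dots, x_n)\, p_{\mathcal{Z}}(x_0), & \text{if } (x_1, \dots, x_n) \in \mathcal{Z} \in \mathcal{B}, \\ 0, & \text{else}, \end{cases}$$

where $\mu$ is a probability distribution on $\mathcal{B}$ and $\lambda_{\mathcal{Z}}$ is a probability distribution on $\mathcal{Z}$
for each $\mathcal{Z} \in \mathcal{B}$ and $(p_{\mathcal{Z}})_{\mathcal{Z} \in \mathcal{B}}$ is a family of probability distributions on $\mathcal{X}_0$.

Proof. It is easy to see that $p$ is indeed a probability distribution. By Lemma 6 it
belongs to $\mathcal{P}_{\mathcal{R}}$. In the other direction, any probability measure can be written as a
product

$$p(x_0, x_1, \dots, x_n) = p(\mathcal{Z})\, p(x_1, \dots, x_n \mid (X_1, \dots, X_n) \in \mathcal{Z})\, p(x_0 \mid x_1, \dots, x_n),$$

if $(x_1, \dots, x_n) \in \mathcal{Z} \in \mathcal{B}$, and if $p$ is an $\mathcal{R}$-robust probability distribution, then $p_{\mathcal{Z}}(x_0) :=$
$p(x_0 \mid x_1, \dots, x_n)$ depends only on the block $\mathcal{Z}$ in which $(x_1, \dots, x_n)$ lies. □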
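The graph $G_{\mathcal{R}}$, the criterion of Lemma 6 and the factorization of Lemma 7 can be tried out on a small case. The following sketch assumes two ternary inputs (indexed from 0) with the specification $\mathcal{R}_1$ and strictly positive, hypothetical choices of $\mu$, $\lambda_{\mathcal{Z}}$ and $p_{\mathcal{Z}}$; the function names are illustrative and not part of the text.

```python
import itertools

def components(points, spec):
    """Connected components of G_{R,Y}: x ~ z iff x|_R = z|_R = y for some (R, y) in spec."""
    comps = []
    for x in points:
        linked = [c for c in comps
                  if any(all(x[i] == v == z[i] for i, v in zip(R, y))
                         for z in c for R, y in spec)]
        comps = [c for c in comps if c not in linked] + [{x}.union(*linked)]
    return {frozenset(c) for c in comps}

def lemma7(blocks, mu, lam, out):
    """p(x0, x) = mu[Z] * lam[Z][x] * out[Z][x0] for x in the block Z, as in Lemma 7."""
    return {(x0, x): mu[Z] * lam[Z][x] * q
            for Z in blocks for x in Z for x0, q in out[Z].items()}

def is_robust(p, spec, tol=1e-9):
    """Lemma 6: the conditionals of X_0 agree within each component of G_{R, supp p}."""
    outputs = sorted({x0 for x0, _ in p})
    weight = lambda x: sum(p.get((a, x), 0.0) for a in outputs)
    cond = lambda x: [p.get((a, x), 0.0) / weight(x) for a in outputs]
    support = [x for x in sorted({x for _, x in p}) if weight(x) > 0]
    return all(abs(u - v) <= tol
               for comp in components(support, spec)
               for x, y in itertools.combinations(sorted(comp), 2)
               for u, v in zip(cond(x), cond(y)))

# Specification R_1 for two ternary inputs: all (R, y) with |R| >= 1.
spec = [(R, y) for r in (1, 2) for R in itertools.combinations(range(2), r)
        for y in itertools.product(range(3), repeat=r)]
Z1 = frozenset(itertools.product((0, 1), repeat=2))
Z2 = frozenset({(2, 2)})
p = lemma7([Z1, Z2], {Z1: 0.6, Z2: 0.4},
           {Z1: {x: 0.25 for x in Z1}, Z2: {(2, 2): 1.0}},
           {Z1: {0: 0.9, 1: 0.1}, Z2: {0: 0.2, 1: 0.8}})

print(components(Z1 | Z2, spec) == {Z1, Z2})  # the two blocks are the components
print(is_robust(p, spec))
```

Changing the conditional of the output at a single point of `Z1` makes `is_robust` fail, which is the content of Lemma 6.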



The factorization in Lemma 7 admits the following interpretation:

Proposition 8. Let $\mathcal{B}$ be an $\mathcal{R}$-robustness structure. Then the set $\mathcal{P}_{\mathcal{B}}$ is the set of
probability distributions such that

$$p(X_{[n]} \in \cup\mathcal{B}) = 1 \qquad \text{and} \qquad X_0 \perp\!\!\!\perp X_{[n]} \mid X_{[n]} \in \mathcal{Z} \quad \text{for all } \mathcal{Z} \in \mathcal{B}. \tag{8}$$

In other words, the sets $\mathcal{Z} \in \mathcal{B}$ determine a partition of the set $\mathrm{supp}\, p$, which consists
of all outcomes of $X_{[n]}$ with non-zero probability under $p$. Within each block $\mathcal{Z}$ the
value of $X_0$ is independent of all inputs. Let $R \subseteq [n]$, and let $x, x' \in \mathcal{X}_{[n]}$ satisfy
$(R, x|_R) \in \mathcal{R}$ and $(R, x'|_R) \in \mathcal{R}$. If $x$ and $x'$ belong to different blocks in $\mathcal{B}$, then $x|_R \ne x'|_R$.
Therefore, the knowledge of the input variables in $R$ is sufficient to determine in which
block $\mathcal{Z} \in \mathcal{B}$ we are.

4. Robust functions

When $p$ or $\mathcal{B}$ is fixed we can introduce an additional random variable $B$ that takes
values in $\mathcal{B}$. The situation is illustrated by the following graph:

[Figure: graph with the input nodes $X_1, X_2, X_3, \dots, X_n$ and arrows to $B$.]

The arrows from the input variables $X_1, \dots, X_n$ to $B$ are, in fact, deterministic:

$$B(x) = \mathcal{Z} \quad \text{if } x \in \mathcal{Z} \in \mathcal{B}.$$

Note, however, that the function $B$ is only defined uniquely on $\cup\mathcal{B}$, which is a set of
measure one with respect to $p$. This means that in many cases it is enough to study
robustness of functions on $\mathcal{X}_{[n]}$.

Definition 9. A function $f$ defined on a subset $\mathcal{S} \subseteq \mathcal{X}_{[n]}$ is $\mathcal{R}$-robust if there exists an
$\mathcal{R}$-robustness structure $\mathcal{B}$ such that $\mathcal{S} = \cup\mathcal{B}$ and $f$ is constant on each $B \in \mathcal{B}$.

There are two motivations for looking at this kind of function: First, they occur
in the special case of $\mathcal{R}$-robust probability distributions $p(X_0, X_1, \dots, X_n)$ such that all
conditional probability distributions $p(X_0 \mid x_1, \dots, x_n)$ are Dirac measures. Second, as
motivated above, we can associate to any $\mathcal{R}$-robust probability distribution $p$ a
corresponding function $f$ characterizing the $\mathcal{R}$-robustness structure. In order to reconstruct
$p$ it is enough to specify the input distribution $p_{in}(X_1, \dots, X_n)$ and a set of output
distributions $\{p(X_0 \mid (X_1, \dots, X_n) \in \mathcal{Z})\}_{\mathcal{Z} \in \mathcal{B}}$ in addition to the function $f : \mathcal{S} \to \mathcal{B}$. Note that
natural examples of robust functions arise from the study of canalizing functions [8, 7].

It is natural to ask the following question: Given a certain robustness structure, how
much freedom is left to choose a robust function $f$? More precisely, how large can the
image of $f$ be? Equivalently, how many components can an $\mathcal{R}$-robustness structure $\mathcal{B}$
have?

Lemma 10. Let $f$ be an $\mathcal{R}$-robust function. The cardinality of the image of $f$ is
bounded from above by

$$\min \Bigl\{ \prod_{i \in R} d_i : R \subseteq [n],\ (R, y) \in \mathcal{R} \text{ for all } y \in \mathcal{X}_R \Bigr\}.$$

Proof. Suppose without loss of generality that $(\{1, \dots, r\}, y) \in \mathcal{R}$ for all $y \in \mathcal{X}_{[r]}$ and
that $d_1 \cdots d_r$ equals the above minimum. The image of $f$ cannot be larger than $d_1 \cdots d_r$,
since if we knock out all $X_i$ for $i > r$, then we can only determine $d_1 \cdots d_r$ states. □

Example 11. Suppose that $\mathcal{S} = \mathcal{X}_{[n]}$. This means that the $\mathcal{R}$-robustness structure satisfies
$\cup\mathcal{B} = \mathcal{X}_{[n]}$. We first consider the case that $G_{\mathcal{R}}$ is connected. This is fulfilled, for example,
if for any $k \in [n]$ there exists $R \subseteq [n]$ such that $k \notin R$ and $(R, y) \in \mathcal{R}$ for all $y \in \mathcal{X}_R$. In
this case an $\mathcal{R}$-robust function $f$ takes only one value.

Assume that $(R, y) \in \mathcal{R}$ implies $(R, y') \in \mathcal{R}$ for all $y' \in \mathcal{X}_R$. If $G_{\mathcal{R}}$ is not connected,
then some input variables may never be knocked out. Let $T$ be the set of these input
variables. For every fixed value of $X_T$ the function $f$ must be constant. This means
that $f$ can have $\prod_{i \in T} d_i$ different values.

Remark (Relation to coding theory). We can interpret $\mathcal{X}_{[n]}$ as a set of words of length $n$
over the alphabet $[d_m]$, where $d_m = \max\{d_i\}$. For simplicity assume that all $d_i$
are equal. Consider the uniform case $\mathcal{R} = \mathcal{R}_k$. Then the task is to find a collection
of subsets such that any two different subsets have Hamming distance at least $k$. A
related problem appears in coding theory: A code is a subset $\mathcal{Y}$ of $\mathcal{X}_{[n]}$ and corresponds
to the case that each element of $\mathcal{B}$ is a singleton. If distinct elements of the code have
Hamming distance at least $n - k$, then a message can be reliably decoded even if only
$k$ letters are transmitted correctly.

5. $\mathcal{R}_k$-robustness

In this section we consider the symmetric case $\mathcal{R} = \mathcal{R}_k$. We fix $n$ and replace any
prefix or subscript $\mathcal{R}$ by $k$.

Let $k = 0$. Any pair $(x, y)$ is an edge in $G_0$. This means that $\mathcal{B}$ can contain only one
set $B$. There is only one maximal 0-robustness structure, namely $\mathcal{B} = \{\mathcal{X}_{[n]}\}$. The set
$\mathcal{P}_0$ is irreducible. This corresponds to the fact that $\mathcal{P}_0$ is defined by $X_0 \perp\!\!\!\perp X_{[n]}$.

$\mathcal{B}$ is actually a maximal $k$-robustness structure for any $0 \le k < n$. This illustrates
the fact that the single CI statement $X_0 \perp\!\!\!\perp X_{[n]}$ implies all other CI statements of the
form (7). The corresponding set $\mathcal{P}_{\mathcal{B}}$ contains all probability distributions of $\mathcal{P}_0$ of full
support.

Now let $k = 1$. In the case $n = 2$, we obtain results by Alexander Fink, which can
be reformulated as follows [?]: Let $n = 2$. A 1-robustness structure $\mathcal{B}$ is maximal if
and only if the following statements hold:

• Each $B \in \mathcal{B}$ is of the form $B = S_1 \times S_2$, where $S_1 \subseteq \mathcal{X}_1$, $S_2 \subseteq \mathcal{X}_2$.

• For every $x_1 \in \mathcal{X}_1$ there exist $B \in \mathcal{B}$ and $x_2 \in \mathcal{X}_2$ such that $(x_1, x_2) \in B$, and
conversely.

In [5] a different description is given: The block $S_1 \times S_2$ can be identified with
the complete bipartite graph on $S_1$ and $S_2$. In this way, every maximal 1-robustness
structure corresponds to a collection of complete bipartite subgraphs with vertices in
$\mathcal{X}_1 \cup \mathcal{X}_2$ such that every vertex in $\mathcal{X}_1$ resp. $\mathcal{X}_2$ is part of one such subgraph.

This result generalizes in the following way:

Lemma 12. A 1-robustness structure $\mathcal{B}$ is maximal if and only if the following
statements hold:

• Each $B \in \mathcal{B}$ is of the form $B = S_1 \times \dots \times S_n$, where $S_i \subseteq \mathcal{X}_i$.

• Fix $j \in [n]$ and $x_i \in \mathcal{X}_i$ for all $i \in [n]$, $i \ne j$. Then there exists $x_j \in \mathcal{X}_j$
such that $(x_1, \dots, x_n) \in \cup_{B \in \mathcal{B}} B$. In other words, whenever $n - 1$ components
of $(x_1, \dots, x_n)$ are prescribed, there exists an $n$-th component such that
$(x_1, \dots, x_n) \in \cup_{B \in \mathcal{B}} B$.

Proof. We say that a subset $\mathcal{Y}$ of $\mathcal{X}_{[n]}$ is connected if $G_{1,\mathcal{Y}}$ is connected. Suppose that $\mathcal{B}$ is
maximal. Let $B \in \mathcal{B}$ and let $S_i$ be the projection of $B \subseteq \mathcal{X}_{[n]}$ to $\mathcal{X}_i$. Let $B' = S_1 \times \dots \times S_n$.
Then $B \subseteq B'$. We claim that $(\mathcal{B} \setminus \{B\}) \cup \{B'\}$ is another coarser 1-robustness structure.
By Definition 5 we need to show that $B'$ is connected and that $A \cup B'$ is not connected
for all $A \in \mathcal{B} \setminus \{B\}$. The first condition follows from the fact that $B$ is connected. For
the second condition assume to the contrary that there are $x \in B'$ and $y \in A$ such that
$x = (x_1, \dots, x_n)$ and $y = (y_1, \dots, y_n)$ disagree in at most $n - 1$ components. Then there
exists a common component $x_l = y_l$. By construction there exists $z = (z_1, \dots, z_n) \in B$
such that $z_l = y_l = x_l$, hence $A \cup B$ is connected, in contradiction to the assumptions.
This shows that each $B$ has a product structure.

Write $B = S^B_1 \times \dots \times S^B_n$ for each $B \in \mathcal{B}$. Obviously $S^B_i \cap S^{B'}_i = \emptyset$ for all $i \in [n]$ and
all $B, B' \in \mathcal{B}$ if $B \ne B'$. The second assertion claims that $\cup_{B \in \mathcal{B}} S^B_i = \mathcal{X}_i$ for all $i \in [n]$:
Assume to the contrary that $l \in \mathcal{X}_i$ is contained in no $S^B_i$. Take any $B$ and define
$B' := S^B_1 \times \dots \times (S^B_i \cup \{l\}) \times \dots \times S^B_n$. Then $(\mathcal{B} \setminus \{B\}) \cup \{B'\}$ is a coarser 1-robustness
structure.

Now assume that $\mathcal{B}$ is a 1-robustness structure satisfying the two assertions of the
lemma. For any $x \in \mathcal{X}_{[n]} \setminus \cup\mathcal{B}$ there exists $y \in \cup\mathcal{B}$ such that $x_1 = y_1$, and hence $(x, y)$ is
an edge in $G_1$. This implies maximality. □

The last result can be reformulated in terms of $n$-partite graphs generalizing [5]:
Namely, the 1-robustness structures are in one-to-one relation with the $n$-partite
subgraphs of $K_{d_1, \dots, d_n}$ such that every connected component is itself a complete $n$-partite
subgraph $K_{e_1, \dots, e_n}$ with $e_i > 0$ for all $i \in [n]$. Here, an $n$-partite subgraph is a graph
which can be coloured by $n$ colours such that no two vertices with the same colour are
connected by an edge.

Unfortunately the nice product form of the maximal 1-robustness structures does
not generalize to $k > 1$:

Example 13 (Binary three inputs). If $n = 3$ and $d_1 = d_2 = d_3 = 2$ and $k = 2$, then
the graph $G_2$ is the graph of the cube. For a maximal 2-robustness structure $\mathcal{B}$ the set
$\mathcal{X}_{[n]} \setminus \cup\mathcal{B}$ can be any one of the following:

• The empty set.

• A set of cardinality 4 corresponding to a plane leaving two connected components
of size 2.

• A set of cardinality 4 containing all vertices with the same parity.

• A set of cardinality 3 cutting off a vertex.

An example for the last case is

$$\mathcal{B} := \{\{(1, 1, 1)\}, \{(2, 2, 2), (2, 2, 1), (2, 1, 2), (1, 2, 2)\}\}.$$

Only the isolated vertex has a product structure.
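The maximality of this structure can be confirmed by brute force using condition (2) of Definition 5. The sketch below assumes the symmetric case, where two points are adjacent in $G_k$ exactly when they agree in at least $k$ coordinates; the helper names are hypothetical.

```python
import itertools

def agree(x, y):
    """Number of coordinates in which x and y coincide."""
    return sum(a == b for a, b in zip(x, y))

def components(points, k):
    """Components of G_k restricted to points: x ~ y iff they agree in >= k coordinates."""
    comps = []
    for x in points:
        linked = [c for c in comps if any(agree(x, z) >= k for z in c)]
        comps = [c for c in comps if c not in linked] + [{x}.union(*linked)]
    return {frozenset(c) for c in comps}

def is_maximal(blocks, space, k):
    """Condition (2) of Definition 5: every missing point lowers the number of components."""
    union = set().union(*blocks)
    if components(union, k) != set(blocks):
        return False  # the blocks are not the components of their union
    return all(len(components(union | {x}, k)) < len(blocks)
               for x in set(space) - union)

cube = list(itertools.product((1, 2), repeat=3))
B = [frozenset({(1, 1, 1)}),
     frozenset({(2, 2, 2), (2, 2, 1), (2, 1, 2), (1, 2, 2)})]
print(is_maximal(B, cube, k=2))  # True
```

The same function can be applied to the other cases of the list, for instance to the four isolated vertices of equal parity.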

Generically, the smaller $k$, the easier it is to describe the structure of all $k$-robustness
structures. We have seen above that the cases $k = 0$ and $k = 1$ are particularly nice.
One might expect that all $k$-robustness structures are also $(k + 1)$-robustness structures
for all $k$. Unfortunately, this is not true in general:

Example 14. Consider $n = 4$ binary random variables $X_1, \dots, X_4$. Then

$$\mathcal{B} := \{\{(1, 1, 1, 1), (2, 2, 1, 1)\}, \{(1, 2, 2, 2), (2, 1, 2, 2)\}\}$$

is a maximal 2-robustness structure. Both elements of $\mathcal{B}$ are 2-connected, but not
3-connected.
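The connectivity claims of Example 14 amount to counting agreeing coordinates. A minimal sketch, again assuming that adjacency in $G_k$ means agreement in at least $k$ coordinates and using a hypothetical helper `is_connected`:

```python
def agree(x, y):
    """Number of coordinates in which x and y coincide."""
    return sum(a == b for a, b in zip(x, y))

def is_connected(block, k):
    """True if the points of block form a connected subgraph of G_k."""
    block = list(block)
    reached, frontier = {block[0]}, [block[0]]
    while frontier:
        x = frontier.pop()
        new = [y for y in block if y not in reached and agree(x, y) >= k]
        reached.update(new)
        frontier.extend(new)
    return len(reached) == len(block)

B1 = [(1, 1, 1, 1), (2, 2, 1, 1)]
B2 = [(1, 2, 2, 2), (2, 1, 2, 2)]
print(is_connected(B1, 2), is_connected(B2, 2))  # both blocks are 2-connected
print(is_connected(B1, 3), is_connected(B2, 3))  # neither block is 3-connected
print(is_connected(B1 + B2, 2))                  # the two blocks are not joined in G_2
```

Within each block the two points agree in exactly two coordinates, while points of different blocks agree in only one.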

The following two lemmas relate $k$-robustness to $l$-robustness for $l > k$:

Lemma 15. Let $\mathcal{B}$ be a $k$-robustness structure. For every $l > k$ there exists an
$l$-robustness structure $\mathcal{B}'$ such that the following holds: For any $\mathcal{Y} \in \mathcal{B}$ there exists
precisely one $\mathcal{Y}' \in \mathcal{B}'$ such that $\mathcal{Y} \subseteq \mathcal{Y}'$.

Proof. The statements (7) for $k$ imply the same statements for $l$, so $\mathcal{P}_k$ is a closed subset
of $\mathcal{P}_l$. Thus $\mathcal{P}_{\mathcal{B}}$ lies in one irreducible subset $\mathcal{P}_{\mathcal{B}'}$ of $\mathcal{P}_l$. The statement now follows
from Lemma 22. □

Lemma 16. Assume that $d_1 = \dots = d_n = 2$, and let $\mathcal{B}$ be a maximal $k$-robustness
structure of binary random variables. Then each $B \in \mathcal{B}$ is connected as a subset of $G_s$
for all $s \le n - 2k$.

Proof. We can identify elements of $\mathcal{X}_{[n]}$ with 01-strings of length $n$. Denote by $I_r$ the string
$1 \dots 1 0 \dots 0$ of $r$ ones and $n - r$ zeroes in this order. Without loss of generality assume
that $I_0, I_l$ are two elements of $B \in \mathcal{B}$, where $k \le n - l < s$. Let $m = \lceil l/2 \rceil$ and consider
$I_m$. We want to prove that we can replace $B$ by $B \cup \{I_m\}$ and obtain another, coarser
$k$-robustness structure. By maximality this will imply that $I_0$ and $I_l$ are indeed connected
by a path in $G_s$.

Otherwise there exist $A \in \mathcal{B}$ and $x \in A$ such that $x$ and $I_m$ agree in at least $k$
components. Let $a$ be the number of zeroes in the first $m$ components of $x$, let $b$ be the
number of ones in the components from $m + 1$ to $l$ and let $c$ be the number of ones in
the last $n - l$ components. Then $I_m$ and $x$ disagree in $a + b + c \le n - k$ components. On
the other hand, $x$ and $I_0$ disagree in $(m - a) + b + c$ components, and $x$ and $I_l$ disagree in
$a + ((l - m) - b) + c \le a + (m - b) + c$ components. Assume that $a \ge b$ (otherwise exchange
$I_0$ and $I_l$). Then $x$ and $I_0$ disagree in at most $m + c \le \lceil l/2 \rceil + n - l = n - \lfloor l/2 \rfloor \le n - k$
components, so $A \cup B$ is connected, in contradiction to the assumptions. □



6. Generalized binomial edge ideals

We refer to [?] for an introduction to the algebraic terminology that is used in this
section.

Let $\tilde{\mathcal{X}}$ be a finite set, $d_0 > 1$ an integer, and denote $\mathcal{X} = \mathcal{X}_0 \times \tilde{\mathcal{X}}$, where $\mathcal{X}_0 = [d_0]$. Fix a field $\mathbb{K}$.
Consider the polynomial ring $R = \mathbb{K}[p_x : x \in \mathcal{X}]$ with $|\mathcal{X}|$ unknowns $p_x$ indexed by $\mathcal{X}$.
For all $i, j \in \mathcal{X}_0$ and all $x, y \in \tilde{\mathcal{X}}$ let

$$f^{ij}_{xy} = p_{ix} p_{jy} - p_{iy} p_{jx}.$$

For any graph $G$ on $\tilde{\mathcal{X}}$ the ideal $I_G$ in $R$ generated by the binomials $f^{ij}_{xy}$ for all $i, j \in \mathcal{X}_0$
and all edges $(x, y)$ in $G$ is called the $d_0$th binomial edge ideal of $G$ over $\mathbb{K}$. This is a
direct generalization of [6] and [9], where the same ideals have been considered in the
special case $d_0 = 2$.

Choose a total order $>$ on $\tilde{\mathcal{X}}$ (e.g. choose a bijection $\tilde{\mathcal{X}} \cong [|\tilde{\mathcal{X}}|]$). This induces a
lexicographic monomial order, that will also be denoted by $>$, via

$$p_{ix} > p_{jy} \iff \text{either } i > j, \text{ or } i = j \text{ and } x > y.$$
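To make the generators of $I_G$ concrete, the following sketch lists them for a small graph and evaluates them at a point. It assumes a hypothetical path graph on three vertices and $d_0 = 3$ with outputs indexed from 0; only pairs $i < j$ are listed, since $f^{ji}_{xy} = -f^{ij}_{xy}$ and $f^{ii}_{xy} = 0$. The function names are illustrative.

```python
import itertools

def generators(edges, d0):
    """Index data (i, j, x, y) of the binomials f^{ij}_{xy} = p_{ix} p_{jy} - p_{iy} p_{jx}."""
    return [(i, j, x, y) for x, y in edges
            for i, j in itertools.combinations(range(d0), 2)]

def evaluate(gen, p):
    """Value of one generator at the point p, a dict keyed by (i, x)."""
    i, j, x, y = gen
    return p[i, x] * p[j, y] - p[i, y] * p[j, x]

edges = [("a", "b"), ("b", "c")]     # hypothetical path graph a - b - c
gens = generators(edges, d0=3)

column = (0.5, 0.3, 0.2)             # common conditional distribution of the output
mass = {"a": 0.2, "b": 0.3, "c": 0.5}
p = {(i, x): mass[x] * column[i] for x in mass for i in range(3)}

print(len(gens))                                          # 6 generators
print(all(abs(evaluate(g, p)) < 1e-12 for g in gens))     # all vanish at p
```

The point `p` has proportional vectors $p_x$ along every edge, so it lies on the variety of $I_G$, in line with the description of robust distributions in Section 3.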
A Gröbner basis for $I_G$ with respect to this order can be constructed using the following
definitions:

Definition 17. A path $\pi : x = x_0, x_1, \dots, x_r = y$ from $x$ to $y$ in $\tilde{\mathcal{X}}$ is called admissible if

(i) $x_k \ne x_l$ for $k \ne l$, and $x < y$;
(ii) for each $k = 1, \dots, r - 1$ either $x_k < x$ or $x_k > y$;
(iii) for any proper subset $\{y_1, \dots, y_s\}$ of $\{x_1, \dots, x_{r-1}\}$, the sequence $x, y_1, \dots, y_s, y$
is not a path.

A function $\kappa : \{0, \dots, r\} \to [d_0]$ is called $\pi$-antitone if it satisfies

$$x_s < x_t \implies \kappa(s) \ge \kappa(t), \qquad \text{for all } 0 \le s, t \le r. \tag{9}$$

$\kappa$ is strictly $\pi$-antitone if it is $\pi$-antitone and satisfies $\kappa(0) > \kappa(r)$.

The notion of $\pi$-antitonicity also applies to paths which are not necessarily admissible.
However, since admissible paths are injective (i.e. they pass through each vertex at most
once), we may write $\kappa(\xi)$ in the admissible case, instead of $\kappa(s)$, if $\xi = \pi(s)$.

For any $x < y$, any admissible path $\pi : x = x_0, x_1, \dots, x_r = y$ from $x$ to $y$ and any
$\pi$-antitone function $\kappa$ associate the monomial

$$u^\kappa_\pi = \prod_{k=1}^{r-1} p_{\kappa(k) x_k}.$$

Theorem 18. The set of binomials 

Q - | u K n f^ K ^ : x < y, n is an admissible path in G from x to y, 

iK i k is strictly n-antitone j 

is a reduced Grobner basis oflc with respect to the monomial order introduced above. 

The proof makes use of the following lemma, which explains ^-antitonicity: 

Lemma 19. Let n : xq, . . . , x r be a path in G, and let k : {0, . . . , r] — > [d] be an 
arbitrary function. If k is not n-antitone, then there exists g € Q such that ini<(^) 
divides the monomial u K n - ]~[[=i Pi<(k)x k - 

Proof. Let $\tau : y_0, \dots, y_s$ be a minimal subpath of $\pi$ with respect to the property that the restriction of $\kappa$ to $\tau$ is not $\tau$-antitone. This means that $\kappa$ is $\tau_0$-antitone and $\tau_s$-antitone, where $\tau_0 = y_1, \dots, y_s$ and $\tau_s = y_0, \dots, y_{s-1}$. Assume without loss of generality that $y_0 < y_s$, otherwise reverse $\tau$. The minimality implies that $\kappa(y_0) < \kappa(y_s)$. It follows that $\tau$ is admissible: By minimality, if $y_0 < y_k < y_s$, then $\kappa(y_k) \geq \kappa(y_s) > \kappa(y_0) \geq \kappa(y_k)$, a contradiction. Define

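$$\tilde\kappa(k) = \begin{cases} \kappa(s), & \text{if } k = 0, \\ \kappa(0), & \text{if } k = s, \\ \kappa(k), & \text{if } 0 < k < s. \end{cases}$$

Then $\tilde\kappa$ is $\tau$-antitone, and $\mathrm{ini}_<\bigl(u^{\tilde\kappa}_\tau f^{\tilde\kappa(s)\tilde\kappa(0)}_{y_0 y_s}\bigr)$ divides $u^\kappa_\pi$. □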

Proof of Theorem 18. The proof is organized in three steps.

Step 1: $\mathcal{G}$ is a subset of $I_G$. Let $\pi : x = x_0, x_1, \dots, x_{r-1}, x_r = y$ be an admissible path in $G$. We show that $u^\kappa_\pi f^{\kappa(r)\kappa(0)}_{xy}$ belongs to $I_G$ using induction on $r$. Clearly the assertion is true if $r = 1$, so assume $r > 1$. Let $A = \{x_k : x_k < x\}$ and $B = \{x_\ell : x_\ell > y\}$. Then either $A \neq \emptyset$ or $B \neq \emptyset$.

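Suppose $A \neq \emptyset$ and set $x_k = \max A$. The two paths $\pi_1 : x_k, x_{k-1}, \dots, x_1, x_0 = x$ and $\pi_2 : x_k, x_{k+1}, \dots, x_{r-1}, x_r = y$ in $G$ are admissible. Let $\kappa_1$ and $\kappa_2$ be the restrictions of $\kappa$ to $\pi_1$ and $\pi_2$. Let $a = \kappa(r)$, $b = \kappa(0)$ and $c = \kappa(k)$. The calculation

$$(p_{by} p_{ax} - p_{bx} p_{ay})\, p_{c x_k} = (p_{cx} p_{b x_k} - p_{c x_k} p_{bx})\, p_{ay} - (p_{cx} p_{a x_k} - p_{c x_k} p_{ax})\, p_{by} - p_{cx} (p_{b x_k} p_{ay} - p_{by} p_{a x_k})$$

implies that $u^\kappa_\pi f^{ab}_{xy}$ lies in the ideal generated by $u^{\kappa_1}_{\pi_1} f^{cb}_{x x_k}$, $u^{\kappa_1}_{\pi_1} f^{ca}_{x x_k}$ and $u^{\kappa_2}_{\pi_2} f^{ba}_{x_k y}$. By induction it lies in $I_G$.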
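The three-term identity used in Step 1, which rewrites $(p_{by} p_{ax} - p_{bx} p_{ay})\, p_{c x_k}$ as a combination of binomials attached to the pairs $(x, x_k)$ and $(x_k, y)$, can be sanity-checked numerically. The sketch below evaluates both sides at random integer points with exact integer arithmetic. This is a plausibility check under the assumption that agreement at many random points indicates a polynomial identity, not a proof; the function name is hypothetical.

```python
import random


def check_step1_identity(trials=200, seed=0):
    """Compare both sides of the Step 1 identity at random integer points."""
    rng = random.Random(seed)
    for _ in range(trials):
        # p[(i, z)] plays the role of p_{iz} for i in {a, b, c}
        # and z in {x, y, x_k}.
        p = {(i, z): rng.randint(-50, 50)
             for i in "abc" for z in ("x", "y", "xk")}
        lhs = (p["b", "y"] * p["a", "x"] - p["b", "x"] * p["a", "y"]) * p["c", "xk"]
        rhs = ((p["c", "x"] * p["b", "xk"] - p["c", "xk"] * p["b", "x"]) * p["a", "y"]
               - (p["c", "x"] * p["a", "xk"] - p["c", "xk"] * p["a", "x"]) * p["b", "y"]
               - p["c", "x"] * (p["b", "xk"] * p["a", "y"] - p["b", "y"] * p["a", "xk"]))
        if lhs != rhs:
            return False
    return True


print(check_step1_identity())
```

Expanding the right-hand side by hand shows that the four monomials containing $p_{cx}$ cancel in pairs, leaving exactly the two monomials of the left-hand side.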

The case $B \neq \emptyset$ can be treated similarly.

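Step 2: $\mathcal{G}$ is a Gröbner basis of $I_G$. Let $\pi : x_0, \dots, x_r$ and $\sigma : y_0, \dots, y_s$ be admissible paths in $G$ with $x_0 < x_r$ and $y_0 < y_s$, and let $\kappa$ and $\mu$ be $\pi$- and $\sigma$-antitone. By Buchberger's criterion we need to show that the S-pair

$$S := S\bigl(u^\kappa_\pi f^{\kappa(r)\kappa(0)}_{x_0 x_r},\; u^\mu_\sigma f^{\mu(s)\mu(0)}_{y_0 y_s}\bigr)$$

reduces to zero.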

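If $S \neq 0$, then $S$ is a binomial. Write $S = S_1 - S_2$, where $S_1 = \mathrm{ini}_<(S)$. $S$ is homogeneous with respect to the multidegrees given by

$$\deg(p_{ix}) = (e_i, e_x) \in \mathbb{Z}^{X_0} \times \mathbb{Z}^{X},$$

where $e_i$ and $e_x$ denote the unit vectors indexed by $i \in X_0$ and $x \in X$.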
If $\pi$ and $\sigma$ are disjoint, then $S = 0$, since $u^\kappa_\pi f^{\kappa(r)\kappa(0)}_{x_0 x_r}$ and $u^\mu_\sigma f^{\mu(s)\mu(0)}_{y_0 y_s}$ contain different variables. The same happens if the intersection of $\pi$ and $\sigma$ does not involve the starting or end points of $\pi$ and $\sigma$, since in this case $S$ is proportional to the S-pair of the two monomials $u^\kappa_\pi$ and $u^\mu_\sigma$.

Assume that $\pi$ and $\sigma$ meet and that $S \neq 0$. Then $S_1$ and $S_2$ are monomials, and the unknowns $p_{ix}$ occurring in $S_1$ and $S_2$ satisfy $x \in \pi \cup \sigma$. Assume that there are $x < y$ such that $D_x := \min\{i \in X_0 : p_{ix} \mid S_1\} < \max\{i \in X_0 : p_{iy} \mid S_1\} =: D_y$. Since $\pi \cup \sigma$ is connected there is an injective path $\tau : z_0, \dots, z_s$ from $x = z_0$ to $y = z_s$ in $\pi \cup \sigma$. Choose a map $\lambda : \{0, \dots, s\} \to X_0$ such that $\lambda(0) = D_x$, $\lambda(s) = D_y$ and $p_{\lambda(a) z_a} \mid S_1$ for all $0 \leq a \leq s$. Then $u^\lambda_\tau$ divides $S_1$, and $\lambda$ is not $\tau$-antitone. So we can apply Lemma 19 in order to reduce $S$ to a smaller binomial.

Let $S'$ be the reduction of $S$ modulo $\mathcal{G}$. If $S' \neq 0$, then let $S'_1 = \mathrm{ini}_<(S')$. The above argument shows that $\min\{i \in X_0 : p_{ix} \mid S'_1\} \geq \max\{i \in X_0 : p_{iy} \mid S'_1\}$ for all $x < y$. This property characterizes $S'_1$ as the unique minimal monomial in $R$ with multidegree $\deg(S'_1) = \deg(S)$. But since the reduction algorithm turns binomials into binomials, $S' - S'_1$ is also a monomial of multidegree $\deg(S)$, and smaller than $S'_1$. This contradiction shows $S' = 0$.

Step 3: $\mathcal{G}$ is reduced. Let $\pi : x_0, \dots, x_r$ and $\sigma : y_0, \dots, y_s$ be admissible paths in $G$ with $x_0 < x_r$ and $y_0 < y_s$, and let $\kappa$ and $\mu$ be $\pi$- and $\sigma$-antitone. Let $u = \kappa(r)$, $v = \kappa(0)$, $w = \mu(s)$, $t = \mu(0)$, and suppose that $u^\kappa_\pi p_{u x_0} p_{v x_r}$ divides either $u^\mu_\sigma p_{w y_0} p_{t y_s}$ or $u^\mu_\sigma p_{w y_s} p_{t y_0}$. Then $\{x_0, \dots, x_r\}$ is a subset of $\{y_0, \dots, y_s\}$, and $\kappa(b) = \mu(\sigma^{-1}(x_b))$ for $0 < b < r$.

If $x_0 = y_0$ and $x_r = y_s$, then $\pi$ is a sub-path of $\sigma$. By Definition 17, $\pi$ equals $\sigma$ (up to a possible change of direction). Hence $u^\kappa_\pi f^{\kappa(r)\kappa(0)}_{x_0 x_r}$ and $u^\mu_\sigma f^{\mu(s)\mu(0)}_{y_0 y_s}$ have the same (total) degree, hence they agree.

If $x_0 = y_0$ and $x_r \neq y_s$, then $p_{v x_r}$ divides $u^\mu_\sigma$, and so $x_r = y_t$ for some $t < s$ such that $v = \mu(t)$. Then $y_t = x_r > x_0 = y_0$, and hence $v \leq \mu(0) = \kappa(0) < \kappa(r)$, in contradiction to $x_0 < x_r$. A similar argument applies if $x_0 \neq y_0$ and $x_r = y_s$. Finally, if $x_0 \neq y_0$ and $x_r \neq y_s$, then $p_{u x_0} p_{v x_r}$ divides $u^\mu_\sigma$. This implies $u = \kappa(0) = \kappa(r) = v$, a contradiction. □

Corollary 20. $I_G$ is a radical ideal.

Proof. The assertion follows from Theorem 18 and the following general fact: A graded ideal that has a Gröbner basis with square-free initial terms is radical. See the proof of [6, Corollary 2.2] for the details. □

Since $I_G$ is radical, in order to compute the primary decomposition of the ideal it is enough to compute the minimal primes. We are mainly interested in the irreducible decomposition of the variety $V_G$ of $I_G$ in the case of characteristic zero. While the basic arguments remain true for finite base fields, there is no relation between the primary decomposition of an ideal and the irreducible decomposition of its variety, since the irreducible decomposition consists of all closed points in this case.

The following definition is needed: Two vectors $v, w$ (living in the same $\Bbbk$-vector space) are proportional whenever $v = \lambda w$ or $w = \lambda v$ for some $\lambda \in \Bbbk$. A set of vectors is proportional if each pair is proportional. Since $\lambda = 0$ is allowed, proportionality is not transitive: If $v$ and $w$ are proportional and if $u$ and $v$ are proportional, then we can conclude that $u$ and $w$ must be proportional only if $v \neq 0$.

We now study the solution variety $V_G$ of $I_G$, which is a subset of $\Bbbk^{X_0 \times X}$. As usual, elements of $\Bbbk^{X_0 \times X}$ will be denoted with the same symbol $p = (p_{ix})_{i \in X_0, x \in X}$ as the unknowns in the polynomial ring $R = \Bbbk[p_{ix} : (i, x) \in X_0 \times X]$. Such a $p$ can be written as a $d_0 \times |X|$-matrix. Each binomial equation in $I_G$ imposes conditions on this matrix saying that certain submatrices have rank at most one. For a fixed edge $(x, y)$ in $G$ the equations $f^{ij}_{xy} = 0$ for all $i, j \in X_0$ require that the submatrix $(p_{kz})_{k \in X_0, z \in \{x, y\}}$ has rank at most one. More generally, if $K \subseteq X$ is a clique (i.e. a complete subgraph), then the submatrix $(p_{kz})_{k \in X_0, z \in K}$ has rank at most one. This means that all columns of this submatrix are proportional. The columns of $p$ will be denoted by $p_x$, $x \in X$. A point $p$ lies in $V_G$ if and only if $p_x$ and $p_y$ are proportional for all edges $(x, y)$ of $G$.

Even if the graph $G$ is connected, not all columns $p_x$ must be proportional to each other, since proportionality is not a transitive relation. Instead, there are "blocks" of columns such that all columns within one block are proportional.
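The edge criterion for membership in $V_G$ amounts to checking that all $2 \times 2$ minors on the two columns of each edge vanish. The following minimal sketch assumes that a point $p$ is stored as a dictionary mapping each vertex to its column (a tuple of integers) and that edges are given as pairs; the function name is hypothetical. The example on the path 1, 2, 3 shows the failure of transitivity: a zero middle column separates two non-proportional outer columns.

```python
from itertools import combinations


def in_variety(p, edges):
    """Check f^{ij}_{xy}(p) = 0 for all edges (x, y) and all rows i < j."""
    for x, y in edges:
        px, py = p[x], p[y]
        for i, j in combinations(range(len(px)), 2):
            # f^{ij}_{xy} = p_{ix} p_{jy} - p_{iy} p_{jx}
            if px[i] * py[j] - py[i] * px[j] != 0:
                return False
    return True


path_edges = [(1, 2), (2, 3)]

# Columns 1 and 3 are not proportional, but the zero column 2 is
# proportional to both, so the point lies in the variety.
blocks = {1: (1, 0), 2: (0, 0), 3: (0, 1)}
print(in_variety(blocks, path_edges))

# With a non-zero middle column the edge (2, 3) is violated.
violating = {1: (1, 0), 2: (1, 0), 3: (0, 1)}
print(in_variety(violating, path_edges))
```

Only the minors with $i < j$ are tested, since $f^{ji}_{xy} = -f^{ij}_{xy}$ and $f^{ii}_{xy} = 0$.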

For any $p \in \Bbbk^{X_0 \times X}$ let $G_p$ be the subgraph of $G$ induced by $\operatorname{supp} p := \{x \in X : p_x \neq 0\}$. We have shown:

• A point $p$ lies in $V_G$ if and only if $p_x$ and $p_y$ are proportional whenever $x, y \in \operatorname{supp} p$ lie in the same connected component of $G_p$.
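The characterization via the support graph $G_p$ can be turned into a direct test. The sketch below is an illustration under the same assumed encoding as before (a dictionary of integer columns and a list of edge pairs); the helper names are hypothetical. It computes the connected components of the subgraph induced by the support and checks proportionality within each component.

```python
def components(vertices, edges):
    """Connected components of the subgraph induced on `vertices`."""
    vertices = set(vertices)
    seen, comps = set(), []
    for start in vertices:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            v = stack.pop()
            if v in comp:
                continue
            comp.add(v)
            for a, b in edges:
                if a == v and b in vertices:
                    stack.append(b)
                elif b == v and a in vertices:
                    stack.append(a)
        seen |= comp
        comps.append(comp)
    return comps


def proportional(v, w):
    """v = lambda * w or w = lambda * v: all 2x2 minors of (v, w) vanish."""
    n = len(v)
    return all(v[i] * w[j] == v[j] * w[i] for i in range(n) for j in range(n))


def in_variety_by_support(p, edges):
    """Membership test using the components of the support graph G_p."""
    support = [x for x, col in p.items() if any(col)]
    for comp in components(support, edges):
        if not all(proportional(p[x], p[y]) for x in comp for y in comp):
            return False
    return True


path_edges = [(1, 2), (2, 3)]
print(in_variety_by_support({1: (1, 0), 2: (0, 0), 3: (0, 1)}, path_edges))
```

In the example the support is $\{1, 3\}$, which induces two isolated vertices, so no proportionality condition between the two non-zero columns remains.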

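For any subset $\mathcal{Y} \subseteq X$ denote by $G_{\mathcal{Y}}$ the subgraph of $G$ induced by $\mathcal{Y}$. Let $V_{G,\mathcal{Y}}$ be the set of all $p \in \Bbbk^{X_0 \times X}$ for which $p_x = 0$ for all $x \in X \setminus \mathcal{Y}$ and for which $p_x$ and $p_y$ are proportional whenever $x, y \in X$ lie in the same connected component of $G_{\mathcal{Y}}$. Then

$$V_G = \bigcup_{\mathcal{Y} \subseteq X} V_{G,\mathcal{Y}}. \tag{10}$$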
The sets $V_{G,\mathcal{Y}}$ are irreducible algebraic varieties:

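Lemma 21. For any $\mathcal{Y} \subseteq X$ the set $V_{G,\mathcal{Y}}$ is the variety of the ideal $I_{G,\mathcal{Y}}$ generated by the monomials

$$p_{ix} \quad \text{for all } x \in X \setminus \mathcal{Y} \text{ and } i \in X_0, \tag{11}$$

and the binomials $f^{ij}_{xy}$ for all $i, j \in X_0$ and all $x, y \in \mathcal{Y}$ that lie in the same connected component of $G_{\mathcal{Y}}$. The ideal $I_{G,\mathcal{Y}}$ is prime.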

Proof. The first statement follows from the definition of $V_{G,\mathcal{Y}}$. Write $I^1_{\mathcal{Y}}$ for the ideal generated by all monomials (11), and for any $\mathcal{Z} \subseteq \mathcal{Y}$ write $I^2_{\mathcal{Z}}$ for the ideal generated by the binomials $f^{ij}_{xy}$ with $i, j \in X_0$ and $x, y \in \mathcal{Z}$. Then $I^1_{\mathcal{Y}}$ is obviously prime. Each of the $I^2_{\mathcal{Z}}$ is a $2 \times 2$ determinantal ideal. It is a classical (but difficult) result that this ideal is the defining ideal of a Segre embedding, and that it is prime (see [10] for a rather modern proof). The ideal $I_{G,\mathcal{Y}}$ is the sum of the prime ideal $I^1_{\mathcal{Y}}$ and the prime ideals $I^2_{\mathcal{Z}}$ for all connected components $\mathcal{Z}$ of $G_{\mathcal{Y}}$, and since the defining equations of all these ideals involve disjoint sets of unknowns, $I_{G,\mathcal{Y}}$ itself is prime. □

The decomposition (10) is not the irreducible decomposition of $V_G$, because the union is redundant. Using Lemma 21 it is easy to remove the redundant components:

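Lemma 22. Let $\mathcal{Y}, \mathcal{Z} \subseteq X$. Then $V_{G,\mathcal{Y}}$ contains $V_{G,\mathcal{Z}}$ if and only if the following two conditions are satisfied:

• $\mathcal{Z} \subseteq \mathcal{Y}$.

• If $x, y \in \mathcal{Z}$ are connected in $G_{\mathcal{Y}}$, then they are connected in $G_{\mathcal{Z}}$.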

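Proof. Assume that $V_{G,\mathcal{Z}} \subseteq V_{G,\mathcal{Y}}$. Then $I_{G,\mathcal{Z}} \supseteq I_{G,\mathcal{Y}}$. For any $x \in X \setminus \mathcal{Y}$ and any $i \in X_0$ this implies $p_{ix} \in I_{G,\mathcal{Z}}$. On the other hand, Lemma 21 shows that the point with coordinates

$$p_{iy} = \begin{cases} 1, & \text{if } y \in \mathcal{Z}, \\ 0, & \text{else,} \end{cases}$$

lies in $V_{G,\mathcal{Z}}$, and hence in $V_{G,\mathcal{Y}}$. This implies $x \notin \mathcal{Z}$, and therefore $\mathcal{Z} \subseteq \mathcal{Y}$.

Let $x \in \mathcal{Z}$. Choose two linearly independent non-zero vectors $v, w \in \Bbbk^{d_0}$. By Lemma 21 the matrix with columns

$$p_y = \begin{cases} v, & \text{if } y \text{ is connected to } x \text{ in } G_{\mathcal{Z}}, \\ w, & \text{if } y \in \mathcal{Z} \text{ is not connected to } x \text{ in } G_{\mathcal{Z}}, \\ 0, & \text{else,} \end{cases}$$

is contained in $V_{G,\mathcal{Z}}$ and hence in $V_{G,\mathcal{Y}}$. Therefore, if $z \in \mathcal{Z}$ is connected to $x$ in $G_{\mathcal{Y}}$, then $p_z$ is proportional to $p_x = v$, so $p_z = v$ and $z$ is connected to $x$ in $G_{\mathcal{Z}}$.

Conversely, if the two conditions are satisfied, then all defining equations of $I_{G,\mathcal{Y}}$ lie in $I_{G,\mathcal{Z}}$. □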

Theorem 23. The primary decomposition of $I_G$ is

$$I_G = \bigcap_{\mathcal{Y}} I_{G,\mathcal{Y}},$$

where the intersection is over all $\mathcal{Y} \subseteq X$ such that the following holds: For any $x \in X \setminus \mathcal{Y}$ there are edges $(x, y)$, $(x, z)$ in $G$ such that $y, z \in \mathcal{Y}$ are not connected in $G_{\mathcal{Y}}$. Equivalently, for any $x \in X \setminus \mathcal{Y}$ the induced subgraph $G_{\mathcal{Y} \cup \{x\}}$ has fewer connected components than $G_{\mathcal{Y}}$.
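For small graphs the index sets $\mathcal{Y}$ appearing in Theorem 23 can be enumerated directly from the second formulation of the condition. The sketch below is a brute-force illustration, exponential in $|X|$; the function names are hypothetical and the graph is assumed to be given as a list of edge pairs. It counts connected components with a union-find structure and keeps those $\mathcal{Y}$ for which adding any missing vertex lowers the count.

```python
from itertools import combinations


def n_components(vertices, edges):
    """Number of connected components of the induced subgraph."""
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for a, b in edges:
        if a in parent and b in parent:
            parent[find(a)] = find(b)
    return len({find(v) for v in parent})


def prime_index_sets(X, edges):
    """All subsets Y of X such that, for every x outside Y, the graph
    induced on Y + {x} has fewer components than the one induced on Y."""
    X = list(X)
    result = []
    for size in range(len(X) + 1):
        for Y in combinations(X, size):
            Y = set(Y)
            base = n_components(Y, edges)
            if all(n_components(Y | {x}, edges) < base
                   for x in X if x not in Y):
                result.append(Y)
    return result


# Path 1 - 2 - 3: removing the middle vertex separates the end points.
print(prime_index_sets([1, 2, 3], [(1, 2), (2, 3)]))
```

For the path on three vertices this yields the two sets $\{1, 3\}$ and $\{1, 2, 3\}$, while for the triangle only the full vertex set remains, in line with the fact that the ideal of a complete graph is a prime determinantal ideal.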

Proof. First, assume that $\Bbbk$ is algebraically closed. By (10) and Lemma 21 it suffices to show that the condition on $\mathcal{Y}$ stated in the theorem characterizes the maximal sets $V_{G,\mathcal{Y}}$ in the union (10) (with respect to inclusion). This follows from Lemma 22.

If $\Bbbk$ is not algebraically closed, then one can argue as follows: It is known that a binomial ideal has a binomial primary decomposition over some extension field $\Bbbk' = \Bbbk[\alpha_1, \dots, \alpha_k]$. The algebraic numbers $\alpha_1, \dots, \alpha_k$ are coefficients of the defining equations of the primary components. Let $\overline{\Bbbk}$ be the algebraic closure of $\Bbbk$. Since the ideals $I_{G,\mathcal{Y}}$ are defined by pure differences and since the ideals $\overline{\Bbbk} \otimes I_{G,\mathcal{Y}}$ are the primary components of $\overline{\Bbbk} \otimes I_G$ in $\overline{\Bbbk} \otimes R$, it follows that the ideals $I_{G,\mathcal{Y}}$ are already the primary components of $I_G$ (in other words, the primary decomposition is independent of the base field). □

Remark (Comparison to [6]). Theorems 18 and 23 are generalizations of Theorems 2.1 and 3.2 from [6]. While Theorem 2.1 in [6] was proved with a case by case analysis, the proof of Theorem 18 is much more conceptual. The proof of Theorem 23 relies on the irreducible decomposition of the corresponding variety. On the other hand, the proof of Theorem 3.2 in [6] directly proves the equality of the two ideals.